Taxonomy induction based on a collaboratively built knowledge repository

Authors

  • Simone Paolo Ponzetto
  • Michael Strube
Abstract

The category system in Wikipedia can be taken as a conceptual network. We label the semantic relations between categories using methods based on connectivity in the network and lexico-syntactic matching. The result is a large scale taxonomy. For evaluation we propose a method which (1) manually determines the quality of our taxonomy, and (2) automatically compares its coverage with ResearchCyc, one of the largest manually created ontologies, and the lexical database WordNet. Additionally, we perform an extrinsic evaluation by computing semantic similarity between words in benchmarking datasets. The results show that the taxonomy compares favorably in quality and coverage with broad-coverage manually created resources.
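As a rough illustration of what lexico-syntactic matching over category names could look like, the sketch below labels a subcategory/supercategory pair by comparing lexical heads. It is not taken from the paper: the function names, the label strings, and the rule of treating the last token of a name as its head are all assumptions made for this example.

```python
# Illustrative sketch only: a naive head-matching heuristic for labeling
# the relation between two category names. Not the authors' implementation.

def head(name: str) -> str:
    """Return the last token of a non-empty name, lowercased.

    Assumption: the last token is the lexical head. This is a crude
    stand-in for real syntactic analysis and fails on names such as
    "Cities in Germany".
    """
    return name.lower().split()[-1]


def label_relation(sub: str, sup: str) -> str:
    """Label a (subcategory, supercategory) pair of non-empty names.

    Hypothetical labels: "is-a", "not-is-a", "unknown".
    """
    sub_tokens = sub.lower().split()
    sup_head = head(sup)
    if head(sub) == sup_head:
        # Same head: the subcategory plausibly specializes the supercategory.
        return "is-a"
    if sup_head in sub_tokens[:-1]:
        # The supercategory's head only modifies the subcategory's head.
        return "not-is-a"
    return "unknown"


if __name__ == "__main__":
    print(label_relation("British computer scientists", "Computer scientists"))
    print(label_relation("Crime comics", "Crime"))
```

Pairs left as "unknown" by such a rule would need other evidence, for example the connectivity-based methods the abstract mentions.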

Similar resources

Collaboratively constructed knowledge repositories as a resource for domain independent concept extraction

To achieve a domain independent text management, a flexible and adaptive knowledge repository is indispensable and represents the key resource for solving many challenges in natural language processing. Especially for real world applications, the needed resources cannot be provided for technical disciplines, like engineering in the energy or the automotive domain. We therefore propose in this p...

Semantic Taxonomy Induction from Heterogenous Evidence

We propose a novel algorithm for inducing semantic taxonomies. Previous algorithms for taxonomy induction have typically focused on independent classifiers for discovering new single relationships based on hand-constructed or automatically discovered textual patterns. By contrast, our algorithm flexibly incorporates evidence from multiple classifiers over heterogenous relationships to optimize ...

SEMI-AUTOMATIC GENERATION OF A CORPUS OF WIKIPEDIA ARTICLES ON SCIENCE AND TECHNOLOGY (Generación semi-automática de un corpus de artículos de Wikipedia sobre ciencia y tecnología)

Despite the huge amount of scientific and technological content available on the World Wide Web, most of it is closed behind paywalls, as with academic journals, or almost invisible, as with institutional repositories. Wikipedia can act as a chain-transfer agent, providing people with an accessible, organized structure containing both understandable content and links to original sources. In Wik...

Constructing Folksonomies by Integrating Structured Metadata with Relational Clustering

Many social Web sites allow users to annotate the content with descriptive metadata, such as tags, and more recently also to organize content hierarchically. These types of structured metadata provide valuable evidence for learning how a community organizes knowledge. For instance, we can aggregate many personal hierarchies into a common taxonomy, also known as a folksonomy, that will aid users...

An On-line SE Repository for Germany’s SME –An Experience Report–

In order to keep pace with competitors in the ever accelerating business world, software organizations have to continuously improve their products and processes. However, SMEs typically do not have the time to invest in costly training programs and test the newest technology on their own. Most of the time they need support for their daily work processes, which often employ agile methods. In thi...

Journal title:
  • Artif. Intell.

Volume 175

Publication date: 2011